Keywords: data transformation, two-phase sampling, correlation level, linearity, extreme observation
Two-phase sampling for regression is an efficient estimation strategy, especially when the study variable and the auxiliary variable(s) are highly correlated. However, the presence of extreme values can cause the data to violate basic statistical assumptions. Violation of the linearity assumption, among others, may lead to Type I or Type II errors in two-phase sampling for regression. This study applied non-linear data transformations to the study variable and/or the auxiliary variables. The results confirmed that data transformation is an efficient empirical tool for correcting the effect of violating the linearity assumption in survey statistical inference. However, when the transformation is applied only to the auxiliary variable (that is, the study variable is left untransformed), a less efficient estimate is obtained, even when the correlation coefficient is high. It was concluded that simultaneous transformation of both the study and auxiliary variables, rather than the size of the correlation coefficient, should be the condition for selecting an efficient estimator in two-phase sampling in survey statistical inference.
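
For illustration only, the idea can be sketched with the classical two-phase regression estimator, ybar + b * (xbar_first - xbar), where xbar_first is the auxiliary mean from the large first-phase sample and ybar, xbar, and the least-squares slope b come from the second-phase subsample. This is not the authors' code: the function names, the choice of the natural logarithm as the non-linear transformation, and the toy data are all assumptions, since the abstract does not state which transformation or estimator variant was used. Note also that the transformed estimate is on the log scale; back-transformation is not addressed here.

```python
import math


def two_phase_regression_estimate(x_first, x_second, y_second):
    """Classical two-phase (double sampling) regression estimate of the mean of y.

    x_first  : auxiliary values observed on the first-phase sample
    x_second : auxiliary values observed on the second-phase subsample
    y_second : study-variable values observed on the second-phase subsample
    """
    n = len(x_second)
    x_bar_first = sum(x_first) / len(x_first)
    x_bar = sum(x_second) / n
    y_bar = sum(y_second) / n
    # Least-squares slope of y on x, fitted on the second-phase subsample.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_second, y_second))
    s_xx = sum((x - x_bar) ** 2 for x in x_second)
    b = s_xy / s_xx
    return y_bar + b * (x_bar_first - x_bar)


def log_transform(values):
    """Natural-log transformation (an assumed choice; requires positive values)."""
    return [math.log(v) for v in values]


# Hypothetical toy data: y = x**2, so y is non-linear in x
# but log(y) = 2 * log(x) is exactly linear in log(x).
x_first = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_second = [1.0, 3.0, 6.0]
y_second = [x ** 2 for x in x_second]

# Transformation applied to the auxiliary variable only.
aux_only = two_phase_regression_estimate(
    log_transform(x_first), log_transform(x_second), y_second
)

# Transformation applied to both the study and the auxiliary variable.
both = two_phase_regression_estimate(
    log_transform(x_first), log_transform(x_second), log_transform(y_second)
)

print("auxiliary only (original y scale):", aux_only)
print("both transformed (log y scale):", both)
```

In this toy setting the doubly transformed data are exactly linear, so the regression adjustment recovers the first-phase mean of log(y) without error; transforming only the auxiliary variable leaves the relationship non-linear. The two printed values are on different scales and are not directly comparable.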